One of the most important features of modern agent-based modeling approaches is the idea of heterogeneity of interactions.
Social, sexual, epidemiological, and communication networks exhibit rich local and global structure that influences the propagation of diseases, behaviors, practices, and beliefs through human societies.
Representing relationships between individuals has proven to be a challenge to quantitative and applied scientists.
In this section, we hope to provide a brief introduction to networks as they are formulated and used in models for public health applications.
Sometimes, instead of data about isolated units (humans, usually) and their attributes/outcomes, we have relational data about the connections between units.
Networks are data structures for representing relational information.
Some people think networks are transcendental data structures that can reveal many hidden truths about the human condition. I don’t, but I do think networks are very useful.
But there are a lot of pitfalls in analyzing network data. Many data analysis heuristics and existing statistical tools do not work for network data, and may give misleading results.
Mathematicians call networks graphs. A graph consists of: - a set \(V\) of vertices (nodes, subjects, units, egos) - a set \(E\) of edges (links, arcs, connections, relationships) \[ E = \{ \{i,j\} \text{ for } i,j\in V\} \]
\[V = \{1,2,3,4,5,6,7,8\}\] \[E = \{ \{1,2\}, \{2,3\}, \{2,4\}, \{2,7\}, \{3,7\}, \{4,5\}, \{5,7\}, \{5,6\}, \{6,8\}, \{7,8\} \}\]
Network (\(G\)) & Vertices (\(V\)) & Edges (\(E\))
| Social | people | friendships |
|---|---|---|
| Needle | needles | people |
| Publication | articles | citations |
| Publication | authors | co-authorship |
| Phylogenetic | species | ancestral relationship |
| Trade | countries | trade relationships |
| Gene expression | genes | interactions |
| Brain | cortical regions | fiber tracts |
| Neural | neurons | axons |
| Injection | people | needle sharing partners |
An undirected network has symmetric links. A directed network has directed links
Edges in an undirected graph are usually denoted as ordered pairs, e.g. \(E=\{ (i,j):\ i \to j\}\).
In a weighted network, each edge \(\{i,j\}\in E\) has a numeric attribute \(z_{ij}\).
Sometimes we define \(z_{ij}=0\) if \(\{i,j\}\notin E\).
Graphs can be
There are lots of other types of graphs!
The degree of a vertex is the number of edges incident to it: \[d_i = |\{j\in V:\ \{i,j\}\in E\}|\]
Vertex 7 has degree \(d_7=4\). Sometimes social scientists call this the egocentric network size or just network size.
For directed graphs, we distinguish between
The degree sequence of a graph is the sequence (sometimes ordered) of vertex degrees.
Degree sequence: \(\mathbf{d}=(1,4,2,2,3,2,4,2)\)
Does the degree sequence \(\mathbf{d}\) uniquely determine the topology of \(G\)?
The degree distribution is the frequency of each degree value in a graph.
Consider a graph \(G=(V,E)\).
Adjacency matrix: \(A_{ij} = 1\) if and only if \(\{i,j\}\in E\).
Visualizing a network requires a layout algorithm that tells your computer where to put each of the vertices relative to each other in two dimensions.
Layout algorithms try to place vertices in 2 dimensions so that:
I recommend:
www.hiveplot.com
Hu, Lu, Wu, “Spectrum-Based Network Visualization for Topology Analysis”, IEEE Computer Graphics and Applications 33, pages 58-68, 2013.
There are two main ways networks are divided into clusters/communities: - Hierarchical clustering iteratively combines (divides) vertices into sets based on their connectivity.
- Spectral clustering exploits algebraic (eigendecomposition) properties of the adjacency matrix \(A\) or the graph Laplacian \(L=D-A\).
Vertices and edges can have attributes, or observations associated with them.
- Let \(x_i\) be attributes of vertex \(i\in V\). - Let \(z_{ij}\) be attributes of the edge \(\{i,j\}\).
We might want to summarize node traits in the network by - degree assortativity: connections to nodes with similar degree - trait assortativity (homophily): connections to nodes with similar traits
Why do we need models for networks? Networks are complicated, and there are lots of them. By constructing models, learn about the structure of a complicated object using a small number of parameters.
One reason we need models is that there are a lot of networks:
Example For a graph \(G=(V,E)\) with \(n = |V| = 50\), there are approximately \(5.77\times 10^{368}\) possible undirected graphs and \(3.34\times 10^{737}\) directed graphs. There are about \(10^{80}\) atoms in the known universe.
This means that it can be prohibitively difficult to enumerate them for even modest \(n\).
So it is helpful to have a model with fewer parameters than \(2^{\binom{n}{2}}\) that helps us summarize networks.
A network model is a set of networks \(\mathcal{G}\) and a probability function \(\Pr(\cdot)\ge 0\) that maps networks to probabilities such that \[ \sum_{G\in\mathcal{G}} \Pr(G) = 1. \]
Probabilistic network models and parameter estimation: If you have a probability model \(\Pr(\cdot|\theta)\), where \(\theta\) is a set of parameters, and you observe a network \(G\), you might want to find the value of \(\theta\) that maximizes the probability of the network. In other words, \[ \hat\theta = \arg\max_{\theta} \Pr(G|\theta) \] This is called “maximum likelihood”.
There are many other ways to estimate parameters. %But the basic idea is to use an observed network \((G,X,Z)\) to tell you something about the model \(\Pr(G,X,Z|\theta)\) that might have generated that network.
Consider a graph of \(n=|V|\) vertices in which each edge exists independently with probability \(p\). Then the probability of observing a particular graph \(G=(V,E)\) is \[ \Pr(G|p) = p^{|E|} (1-p)^{\binom{n}{2} - |E|} \] Many analytic results can be derived for the ER model:
Is the ER model a good model for human social networks? Does the Erdos-Renyi model produce community structure?
A simple model for graph edges, given vertex attributes, is \[ \log\left[\frac{\Pr(i \sim j)}{\Pr(i\nsim j)}\right] = \alpha + \beta f(x_i,x_j) \] where \(\alpha\) is an intercept and \(f(x_i,x_j)\) is some (symmetric) function:
You should use this model to explain the formation of a network if:
An exponential random graph model (ERGM) has probability \[ \Pr(G|\theta) = \frac{\exp[\theta'h(G)]}{\kappa(\theta)} \] where \(h(G)\) is some function of \(G\) (called the ), and \[ \kappa(\theta) = \sum_{g\in \mathcal{G}} \exp[\theta'h(g)] \] is a normalizing constant that ensures these probabilities sum to 1.
Good things:
Examples of ERGM sufficient statistics
Model degeneracy occurs:
Rinaldo, Fienberg, Zhou. On the geometry of discrete exponential families with application to exponential random graph models. 3 (2009): 446-484.
Let \(h(G) = (\text{\#edges}, \text{\#triangles})\). Let \(\Pr(G|\theta) \propto \exp[\theta'h(G)]\).
Rinaldo, Fienberg, Zhou. On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics 3 (2009): 446-484.
Exchangeable graphs have the property that their distribution is invariant to permutation of the vertex labels.
The stochastic block model (SBM) is a graph in which each vertex \(i\in V\) has a categorical attribute \(x_i\in\{1,\ldots,K\}\). The attribute \(x_i\) is called the “block” membership of \(i\).
Define a \(K\times K\) matrix \[ P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1K} \\ p_{21} & p_{22} & \cdots & p_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ p_{K1} & p_{K2} & \cdots & p_{KK} \\ \end{pmatrix} \] that defines the edge probabilities between each of the \(K\) “blocks”.
An edge between \(i\) and \(j\) is formed with probability \(P_{x_i,x_j}\).
See also: degree-corrected SBM.
There are lots of other network models designed to capture important features of real-world networks.
One of the main features is a “heavy-tailed” degree distribution in which most vertices have small degree, and some have very large degree.
Further reading:
Dynamic or stochastic processes on networks are beyond the scope of this talk. However, you might be interested in further reading on:
Kolaczyk and Csárdi (2014) Kolaczyk (2009) Newman (2010) Jackson (2010)
Kolaczyk, Eric D. 2009. Statistical Analysis of Network Data: Methods and Models. Springer Science & Business Media.
Kolaczyk, Eric D, and Gábor Csárdi. 2014. Statistical Analysis of Network Data with R. Vol. 65. Springer.
Newman, Mark. 2010. Networks: An Introduction. Oxford university press.